A method of estimating the pitch of a speech signal using
a binary signal, use of the method, and a device adapted
therefor

The invention relates to a method of estimating the pitch
of a speech signal, said method being of the type where
the speech signal is divided into segments, a conformity
function for the signal is calculated for each segment,
and peaks in the conformity function are detected. The
invention also relates to the use of the method in a
mobile telephone. Further, the invention relates to a
device adapted to estimate the pitch of a speech signal.

In many speech processing systems it is desirable to know
the pitch period of the speech. As an example, several
speech enhancement algorithms are dependent on having a
correct estimate of the pitch period. One field of
application where speech processing algorithms are widely
used is in mobile telephones.

A well-known way of estimating the pitch period is to use
the autocorrelation function, or a similar conformity
function, on the speech signal. An example of such a
method is described in the article D. A. Krubsack, R. J.
Niederjohn, "An Autocorrelation Pitch Detector and Voicing
Decision with Confidence Measures Developed for
Noise-Corrupted Speech", IEEE Transactions on Signal
Processing, vol. 39, no. 2, pp. 319-329, Feb. 1991. The
speech signal is divided into segments of 51.2 ms, and
the standard short-time autocorrelation function is
calculated for each successive speech segment. A peak
picking algorithm is applied to the autocorrelation
function of each segment. This algorithm starts by choosing
the maximum peak (largest value) in the pitch range of 50
to 333 Hz. The period corresponding to this peak is selected
as an estimate of the pitch period.
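
By way of illustration only, the basic procedure described
above may be sketched as follows in Python. The sampling
rate of 8000 Hz, the function names and the synthetic pulse
train are assumptions made for the example and are not taken
from the cited article.

```python
import math

def autocorrelation(x, max_lag):
    """Short-time autocorrelation r[k] = sum over n of x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def basic_pitch_lag(x, fs, f_min=50.0, f_max=333.0):
    """Lag (in samples) of the largest autocorrelation value in the pitch range."""
    lag_min = math.ceil(fs / f_max)    # shortest period searched
    lag_max = math.floor(fs / f_min)   # longest period searched
    r = autocorrelation(x, lag_max)
    return max(range(lag_min, lag_max + 1), key=lambda k: r[k])

# Assumed test signal: pulse train with a period of 40 samples,
# roughly 51.2 ms long at the assumed rate of 8000 Hz.
fs = 8000
segment = [1.0 if n % 40 == 0 else 0.0 for n in range(410)]
lag = basic_pitch_lag(segment, fs)
print(lag, "samples =", 1000.0 * lag / fs, "ms")
```

For this clean pulse train the largest value in the search
range lies at the true period, so no multiple of the period
is selected.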

However, such a basic pitch estimation algorithm is not
sufficient. In some cases pitch doubling can occur, i.e.
the highest peak appears at twice the pitch period. The
highest peak may also appear at another multiple of the
true pitch period. In these cases a simple selection of
the maximum peak will provide a wrong estimate of the
pitch period.

The above-mentioned article also discloses a method of
improving the algorithm in these situations. The
algorithm checks for peaks at one-half, one-third,
one-fourth, one-fifth, and one-sixth of the first estimate
of the pitch period. If the half of the first estimate is
within the pitch range, the maximum value of the
autocorrelation within an interval around this half value
is located. If this new peak is greater than one-half of
the old peak, the new corresponding value replaces the old
estimate, thus providing a new estimate which is
presumably corrected for the possibility of the pitch
period doubling error. This test is performed again to
check for double doubling errors (fourfold errors). If this
most recent test fails, a similar test is performed for
tripling errors of this new estimate. This test checks for
pitch period errors of sixfold. If the original test
failed, the original estimate is tested (in a similar
manner) for tripling errors and errors of fivefold. The
final value is used to calculate the pitch estimate.
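
A single pass of the halving test described above might be
sketched as follows. This is a simplified illustration
only: the width of the search interval (half_width) is an
assumed value, since the text does not state it, and the
tests for tripling and fivefold errors are omitted.

```python
def correct_doubling(r, lag, lag_min, half_width=2):
    """One halving test on a period estimate.

    r          autocorrelation values indexed by lag
    lag        current estimate of the pitch period in samples
    lag_min    shortest period inside the pitch range
    half_width assumed size of the search interval around lag / 2
    """
    half = lag // 2
    if half < lag_min:
        return lag                      # half period is outside the pitch range
    lo = max(lag_min, half - half_width)
    hi = min(len(r) - 1, half + half_width)
    candidate = max(range(lo, hi + 1), key=lambda k: r[k])
    # Replace the estimate only if the new peak exceeds half the old peak.
    return candidate if r[candidate] > 0.5 * r[lag] else lag

# Assumed autocorrelation with the higher peak at twice the true period.
r = [0.0] * 161
r[0], r[42], r[84] = 2.0, 0.8, 1.0
first = correct_doubling(r, 84, lag_min=25)      # doubling corrected
second = correct_doubling(r, first, lag_min=25)  # repeated test leaves it unchanged
```

Even this reduced form needs a maximum search for every
submultiple tested, which is the cost the following
paragraph refers to.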

However, this known algorithm is rather complex and
requires a high number of calculations, and these drawbacks
make it less usable in real time environments on small
digital signal processors as they are used in mobile
telephones and similar devices.

Thus, it is an object of the invention to provide a
method of the above-mentioned type which is less complex
than the prior art methods, such that the method is
suitable for small digital signal processors.

According to the invention, this object is achieved in
that the method further comprises the steps of providing
an intermediate signal derived from the speech signal,
converting the intermediate signal to a binary signal,
which is set to logical "1" where the intermediate signal
exceeds a pre-selected threshold and to logical "0" where
the intermediate signal does not exceed the pre-selected
threshold, calculating the autocorrelation of the binary
signal, and using the distance between peaks in the
autocorrelation of the binary signal as an estimate of the
pitch.

The calculation of the autocorrelation of the binary
signal takes only a fraction of the computational resources
needed for the prior art algorithms. Since the binary
signal is non-zero in only a few positions, the values
of the resulting autocorrelation will occur around zero
and around the pitch period of the speech signal, and
there will only be a few values separated from zero.
Thus, the pitch period can easily be estimated as the
distance between the values at position zero and the
values separated from zero. The large amount of operations
needed in prior art algorithms where a specific value has
to be found in a vector of numbers is thus avoided.
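
The saving may be illustrated by the following sketch, which
is not part of the described method as such. It assumes a
hypothetical intermediate signal with two values above the
threshold; the function names and the threshold value of
0.75 are chosen for the example only.

```python
from collections import Counter

def to_binary(signal, threshold):
    """1 where the intermediate signal exceeds the threshold, otherwise 0."""
    return [1 if value > threshold else 0 for value in signal]

def binary_autocorrelation(binary):
    """Autocorrelation of a 0/1 sequence, returned sparsely as {lag: count}.

    Only pairs of ones contribute, so the work depends on the number
    of ones and not on the length of the segment.
    """
    ones = [i for i, bit in enumerate(binary) if bit]
    return Counter(j - i for i in ones for j in ones if j >= i)

# Hypothetical intermediate signal: two values exceed the threshold.
intermediate = [0.0] * 161
intermediate[42], intermediate[84] = 0.8, 1.0
r_bin = binary_autocorrelation(to_binary(intermediate, 0.75))
print(sorted(r_bin.items()))
```

With two ones in the binary signal, only three pairs are
visited and the result is non-zero at lag zero and at a
single lag separated from zero.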

In one embodiment the intermediate signal may be provided
by filtering the speech signal through a filter based on
a set of filter parameters estimated by means of linear
predictive analysis (LPA). In this way much of the
smearing of the original speech signal is removed.

Alternatively, the intermediate signal may be provided by
calculating the autocorrelation of a signal derived from
the speech signal by filtering the speech signal through
a filter based on a set of filter parameters estimated by
means of linear predictive analysis (LPA). This solution
also removes most of the smearing of the original speech
signal, and further the possibility of clearer peaks in
the intermediate signal is improved.

If the peak corresponding to the distance between the
peaks is represented by a number of samples, the best
estimate is achieved when the sample having the maximum
amplitude of said conformity function is selected as the
estimate of the pitch.

In an expedient embodiment of the invention the method is 
used in a mobile telephone, which is a typical example of 
a device having only limited computational resources. 

As mentioned, the invention further relates to a device
adapted to estimate the pitch of a speech signal. The
device comprises means for sampling the speech signal to
obtain a series of samples, means for dividing the series
of samples into segments, each segment having a fixed
number of consecutive samples, means for calculating for
each segment a conformity function for the signal, and
means for detecting peaks in the conformity function.

When the device further comprises means for providing an
intermediate signal derived from the speech signal, means
for converting said intermediate signal to a binary
signal, said binary signal being set to logical "1" where
the intermediate signal exceeds a pre-selected threshold
and to logical "0" where the intermediate signal does not
exceed the pre-selected threshold, means for calculating
the autocorrelation of the binary signal, and means for
using the distance between peaks in the autocorrelation
of the binary signal as an estimate of the pitch, a
device less complex than prior art devices is achieved,
which also avoids the pitch halving situation.

In one embodiment the device may be adapted to provide
the intermediate signal by filtering the speech signal
through a filter based on a set of filter parameters
estimated by means of linear predictive analysis (LPA). In
this way much of the smearing of the original speech
signal is removed.

Alternatively, the device may be adapted to provide the
intermediate signal by calculating the autocorrelation of
a signal derived from the speech signal by filtering the
speech signal through a filter based on a set of filter
parameters estimated by means of linear predictive
analysis (LPA). This solution also removes most of the
smearing of the original speech signal, and further the
possibility of clearer peaks in the intermediate signal is
improved.

If the peak corresponding to the distance between the
peaks is represented by a number of samples, the best
estimate is achieved when the device is adapted to select
the sample having the maximum amplitude of said
conformity function as the estimate of the pitch.

In an expedient embodiment of the invention, the device
is a mobile telephone, which is a typical example of a
device having only limited computational resources.

In another embodiment the device is an integrated circuit
which can be used in different types of equipment.



Printed:26-03-2001 






The invention will now be described more fully below with 
reference to the drawing, in which 

figure 1 shows a block diagram of a pitch detector
according to the invention,

figure 2 shows the generation of a residual signal,

figure 3a shows a 20 ms segment of a voiced speech
signal,

figure 3b shows the autocorrelation function of a
residual signal corresponding to the segment of figure 3a,
and

figure 4 shows an example of an autocorrelation function
where pitch doubling could arise.

Figure 1 shows a block diagram of an example of a pitch
detector 1 according to the invention. A speech signal 2
is sampled with a sampling rate of 8 kHz in the sampling
circuit 3 and the samples are divided into segments or
frames of 160 consecutive samples. Thus, each segment
corresponds to 20 ms of the speech signal. This is the
sampling and segmentation normally used for the speech
processing in a standard mobile telephone.
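
The segmentation may be sketched as below. The sampling
rate and frame length are those stated above; the function
name, the use of non-overlapping frames and the dropping of
a trailing partial frame are assumptions made for the
example.

```python
FS = 8000         # sampling rate in Hz, as stated in the text
FRAME_LEN = 160   # samples per segment, as stated in the text

def split_into_frames(samples, frame_len=FRAME_LEN):
    """Return consecutive, non-overlapping frames of frame_len samples.

    A trailing partial frame is dropped (an assumption for this sketch).
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len]
            for i in range(0, last_start + 1, frame_len)]

frame_ms = 1000 * FRAME_LEN / FS              # duration of one segment in ms
frames = split_into_frames(list(range(500)))  # 500 dummy samples
print(frame_ms, len(frames))
```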

Each segment of 160 samples is then processed in a filter 
4, which will be described in further detail below. 

First, however, the nature of speech signals will be
mentioned briefly. In a classical approach a speech signal
is modelled as an output of a slowly time-varying linear
filter. The filter is either excited by a quasi-periodic
sequence of pulses or random noise depending on whether a
voiced or an unvoiced sound is to be created. The pulse
train which creates voiced sounds is produced by pressing
air out of the lungs through the vibrating vocal cords.
The period of time between the pulses is called the pitch
period and is of great importance for the singularity of
the speech. On the other hand, unvoiced sounds are
generated by forming a constriction in the vocal tract and
producing turbulence by forcing air through the
constriction at a high velocity. This description deals
with the detection of the pitch period of voiced sounds, and
thus unvoiced sounds will not be further considered.

As speech is a varying signal, the filter also has to be
time-varying. However, the properties of a speech signal
change relatively slowly with time. It is reasonable to
assume that the general properties of speech remain
fixed for periods of 10-20 ms. This has led to the basic
principle that if short segments of the speech signal are
considered, each segment can effectively be modelled as
having been generated by exciting a linear time-invariant
system during that period of time. The effect of the
filter can be seen as caused by the vocal tract, the tongue,
the mouth and the lips.



As mentioned, voiced speech can be interpreted as the
output signal from a linear filter driven by an
excitation signal. This is shown in the upper part of figure
2 in which the pulse train 21 is processed by the filter 22
to produce the voiced speech signal 23. A good signal for
the detection of the pitch period is obtained if the
excitation signal can be extracted from the speech. By
estimating the filter parameters A in the block 24 and then
filtering the speech through an inverse filter 25 based
on the estimated filter parameters, a signal 26 similar
to the excitation signal can be obtained. This signal is
called the residual signal. This process is shown in the
lower part of figure 2. The blocks 24 and 25 are included
in the filter 4 in figure 1.

The estimation of the filter parameters is based on an
all-pole modelling which is performed by means of the
method called linear predictive analysis (LPA). The name
comes from the fact that the method is equivalent to
linear prediction. This method is well known in the art
and will not be described in further detail here.
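
For orientation only, one common way of carrying out such
an analysis is the autocorrelation method solved with the
Levinson-Durbin recursion, followed by inverse filtering.
The sketch below assumes that variant; the text does not
say which form of LPA is used, and the function names, the
predictor order and the toy signal are illustrative
assumptions.

```python
def autocorr(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]                          # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break                       # degenerate (e.g. all-zero) segment
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                   # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def residual(x, coeffs):
    """Inverse filter e[n] = x[n] - sum_k a[k] * x[n - k], zero history assumed."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out

# Toy check: response of the one-pole filter x[n] = 0.9 * x[n-1] + pulse[n]
# to a single pulse at n = 0, over one 160-sample segment.
x = [0.9 ** n for n in range(160)]
a = levinson_durbin(autocorr(x, 1), order=1)
e = residual(x, a)                      # close to the original single pulse
```

In this toy case the inverse filter recovers the exciting
pulse almost exactly; for real speech the residual is only
similar to the excitation.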

The estimation of the pitch is based on the
autocorrelation of the residual signal, which is obtained as
described above. Thus, the output signal from the filter 4
is taken to an autocorrelation calculation unit 5. Figure
3a shows an example of a 20 ms segment of a voiced speech
signal and figure 3b the corresponding autocorrelation
function of the residual signal. It will be seen from
figure 3a that the actual pitch period is about 5.25 ms
corresponding to 42 samples, and thus the pitch estimation
should end up with this value.

The next step in the estimation of the pitch is to apply
a peak picking algorithm to the autocorrelation function
provided by the unit 5. This is done in the peak detector
6 which identifies the maximum peak (i.e. the largest
value) in the autocorrelation function. The index value,
i.e. the sample number or the lag, of the maximum peak is
then used as a preliminary estimate of the pitch period.
In the case shown in figure 3b it will be seen that the
maximum peak is actually located at a lag of 42 samples.
The search for the maximum peak is only performed in the
range where a pitch period is likely to be located. In
this case the range is set to 60-333 Hz.
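
The relation between lags, periods and the search range
may be checked with the short sketch below. The helper
names are hypothetical, and rounding the range limits
inwards to whole lags is an assumption; the text gives the
range only in Hz.

```python
import math

def lag_to_ms(lag, fs):
    """Period in milliseconds for a lag given in samples."""
    return 1000.0 * lag / fs

def lag_to_hz(lag, fs):
    """Pitch frequency in Hz for a lag given in samples."""
    return fs / lag

def lag_search_range(fs, f_min, f_max):
    """Whole-sample lags whose frequency lies within [f_min, f_max]."""
    return math.ceil(fs / f_max), math.floor(fs / f_min)

FS = 8000
period_ms = lag_to_ms(42, FS)                # 5.25 ms, as stated above
pitch_hz = lag_to_hz(42, FS)                 # roughly 190 Hz
lo, hi = lag_search_range(FS, 60.0, 333.0)   # lags 25 to 133
print(period_ms, round(pitch_hz, 1), lo, hi)
```

At 8 kHz the range of 60-333 Hz thus corresponds to lags
of 25 to 133 samples, all of which fit within one segment
of 160 samples.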

However, this basic pitch estimation algorithm is not
always sufficient. In some cases pitch doubling may occur,
i.e. due to distortion the peak in the autocorrelation
function corresponding to the true pitch period is not
the highest peak, but instead the highest peak appears at
twice the pitch period. The highest peak could also
appear at other multiples of the actual pitch period (pitch
tripling, etc.) although this occurs relatively rarely. A
typical example where pitch doubling would arise is shown
in figure 4 which again shows the autocorrelation
function of the residual signal. Here, too, the correct pitch
period would be around 42 samples, but the peak at twice
the pitch period, i.e. around 84 samples, is actually
higher than the one at 42 samples. The basic pitch
estimation algorithm would therefore estimate the pitch
period to 84 samples and pitch doubling would thus occur.

To avoid the problem of pitch doubling the pitch
detection algorithm is therefore improved as described below.

After the preliminary pitch estimate has been determined,
it is checked in the risk check unit 7 whether there is
any risk of pitch doubling. All peaks with a peak value
higher than 75% of the maximum peak are detected and the
further processing depends on the result of this
detection. If only one peak is detected, i.e. the original
maximum peak, there is no need to perform a process to
avoid pitch doubling. In this situation the preliminary
pitch estimate is used as the final pitch estimate. If,
however, more than one peak is detected, there is a risk
of pitch doubling and a further algorithm must be
performed to ensure that the correct peak is selected as the
pitch estimate. This is performed in the unit 8.
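
The risk check may be sketched as follows. The 75% ratio
is taken from the text; treating a peak as a local maximum
of the autocorrelation, the function names and the example
values are assumptions.

```python
def local_peaks(r, lo, hi):
    """Lags in [lo, hi] at which r has a local maximum (assumed peak definition)."""
    first = max(lo, 1)
    last = min(hi, len(r) - 2)
    return [k for k in range(first, last + 1)
            if r[k] > r[k - 1] and r[k] >= r[k + 1]]

def high_peaks(r, lo, hi, ratio=0.75):
    """Peaks whose value is higher than `ratio` times the maximum peak."""
    peaks = local_peaks(r, lo, hi)
    if not peaks:
        return []
    top = max(r[k] for k in peaks)
    return [k for k in peaks if r[k] > ratio * top]

# Assumed autocorrelation of a residual signal, in the manner of figure 4.
r = [0.0] * 161
r[42], r[84], r[126] = 0.8, 1.0, 0.5
candidates = high_peaks(r, 25, 133)
risk_of_doubling = len(candidates) > 1
print(candidates, risk_of_doubling)
```

Only when more than one lag is returned does the further
processing in the unit 8 need to be carried out.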

To identify the peak corresponding to the actual pitch
period a modified signal is provided based on the
location of the peaks in the autocorrelation of the residual
signal. This modified signal, referred to as the binary
signal, consists of only ones and zeros. The binary signal
is set to one where the high peaks are found in the
autocorrelation sequence. All other values are set to zero,
and then the autocorrelation of the binary signal is
calculated. Since the binary signal is non-zero in only a
few positions, the resulting autocorrelation will
only have a few values separated from zero, and these
values will occur around the pitch period of the signal.
The pitch period is estimated by observing the distance
between the indexes of the values around zero and those
separated from zero. If the group of values separated
from zero contains only a single value, it is selected as
the estimate of the pitch period. If there is more than
one value in the group, the one with the highest
amplitude in the autocorrelation of the residual signal is
chosen.

Cases may sometimes arise where the peak at lag zero is
the only peak present. This situation will occur when a
peak has been split across two samples and there are no
other high peaks in the autocorrelation of the residual
signal. In this case the preliminary pitch estimate is
chosen as the final pitch estimate.
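
One possible reading of the selection in the unit 8,
including the fallback just described, is sketched below.
It is an interpretation and not a definitive
implementation: a "group" is taken to be a run of
consecutive lags, the first group separated from zero is
used, and all names are hypothetical.

```python
def select_pitch(high_peak_lags, r_residual, preliminary):
    """Pick the pitch period from the lags at which the binary signal is one.

    high_peak_lags  lags where the binary signal equals one
    r_residual      autocorrelation of the residual signal, indexed by lag
    preliminary     lag of the maximum peak, used as fallback
    """
    ones = sorted(high_peak_lags)
    # The autocorrelation of the binary signal is non-zero only at
    # differences between positions of ones.
    nonzero = sorted({b - a for a in ones for b in ones if b >= a})
    # Skip the run of lags that is contiguous with lag zero.
    i = 0
    while i + 1 < len(nonzero) and nonzero[i + 1] == nonzero[i] + 1:
        i += 1
    rest = nonzero[i + 1:]
    if not rest:
        return preliminary          # only the peak at lag zero is present
    # First group separated from zero: a run of consecutive lags.
    group = [rest[0]]
    for lag in rest[1:]:
        if lag != group[-1] + 1:
            break
        group.append(lag)
    # Several values in the group: take the highest residual autocorrelation.
    return max(group, key=lambda k: r_residual[k])

r = [0.0] * 161
r[41], r[42], r[43], r[84] = 0.78, 0.8, 0.1, 1.0
doubling_case = select_pitch([42, 84], r, preliminary=84)
split_peak_only = select_pitch([41, 42], r, preliminary=42)
group_case = select_pitch([41, 42, 84], r, preliminary=84)
```

In the first call the higher peak at 84 samples is not
selected, because the distance between the two ones in the
binary signal is 42 samples.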

This algorithm is very simple, and therefore it is well
suited for use in e.g. mobile telephones in which the
computational resources are severely limited, and a demand
for a low-complexity algorithm is thus placed upon the
system. The algorithm may also be implemented in an
integrated circuit which may then be used in other types of
equipment.

Although a preferred embodiment of the present invention
has been described and shown, the invention is not
restricted to it, but may also be embodied in other ways
within the scope of the subject-matter defined in the
following claims.

Thus, the autocorrelation function may be calculated
directly from the speech signal instead of the residual
signal, or other conformity functions may be used instead of
the autocorrelation function. As an example, a cross
correlation could be calculated between the speech signal
and the residual signal.



Further, different sampling rates and sizes of the
segments may be used.

Patent Claims:



1. A method of estimating the pitch of a speech signal
(2), said method comprising the steps of:

• sampling the speech signal to obtain a series of
samples,

• dividing the series of samples into segments, each
segment having a fixed number of consecutive samples,

• calculating for each segment a conformity function
for the signal, and

• detecting peaks in the conformity function,

characterized in that the method further comprises the
steps of:

• providing an intermediate signal derived from the
speech signal,

• converting said intermediate signal to a binary
signal, said binary signal being set to logical "1"
where the intermediate signal exceeds a pre-selected
threshold and to logical "0" where the intermediate
signal does not exceed the pre-selected threshold,

• calculating the autocorrelation of the binary
signal, and

• using the distance between peaks in the
autocorrelation of the binary signal as an estimate of the
pitch.

2. A method according to claim 1, characterized in that
the intermediate signal is provided by filtering the
speech signal through a filter (4) based on a set of
filter parameters estimated by means of linear predictive
analysis (LPA).



3. A method according to claim 1, characterized in that
the intermediate signal is provided by calculating the
autocorrelation of a signal derived from the speech
signal, by filtering the speech signal through a filter (4)
based on a set of filter parameters estimated by means of
linear predictive analysis (LPA).

4. A method according to any one of claims 1 to 3,
characterized in that it further comprises the step of:

• selecting, if the peak corresponding to the distance
between the peaks is represented by a number of
samples, the sample having the maximum amplitude of
said conformity function as the estimate of the
pitch.

5. Use of the method according to any one of claims 1 to
4 in a mobile telephone.

6. A device adapted to estimate the pitch of a speech
signal, and comprising:

• means (3) for sampling the speech signal to obtain a
series of samples,

• means for dividing the series of samples into
segments, each segment having a fixed number of
consecutive samples,

• means (5) for calculating for each segment a
conformity function for the signal, and

• means (6) for detecting peaks in the conformity
function,

characterized in that the device further comprises:

• means for providing an intermediate signal derived
from the speech signal,

• means (8) for converting said intermediate signal to
a binary signal, said binary signal being set to
logical "1" where the intermediate signal exceeds a
pre-selected threshold and to logical "0" where the
intermediate signal does not exceed the pre-selected
threshold,

• means (5) for calculating the autocorrelation of the
binary signal, and

• means for using the distance between peaks in the
autocorrelation of the binary signal as an estimate
of the pitch.

7. A device according to claim 6, characterized in that
the device is adapted to provide the intermediate signal
by filtering the speech signal through a filter (4) based
on a set of filter parameters estimated by means of
linear predictive analysis (LPA).

8. A device according to claim 6, characterized in that
the device is adapted to provide the intermediate signal
by calculating the autocorrelation of a signal derived
from the speech signal by filtering the speech signal
through a filter (4) based on a set of filter parameters
estimated by means of linear predictive analysis (LPA).

9. A device according to any one of claims 6 to 8,
characterized in that it is further adapted to select, if
the peak corresponding to the distance between the peaks
is represented by a number of samples, the sample having
the maximum amplitude of said conformity function as the
estimate of the pitch.

10. A device according to any one of claims 6 to 9,
characterized in that the device is a mobile telephone.

11. A device according to any one of claims 6 to 9,
characterized in that the device is an integrated
circuit.

A method of estimating the pitch of a speech signal using
a binary signal, use of the method, and a device adapted
therefor



ABSTRACT 

A method of estimating the pitch of a speech signal (2)
comprises the steps of sampling the speech signal to
obtain a series of samples, dividing the series of samples
into segments, each segment having a fixed number of
consecutive samples, calculating for each segment a
conformity function, and detecting peaks in the conformity
function. The method further comprises the steps of
providing an intermediate signal derived from the speech
signal, converting the intermediate signal to a binary
signal, which is set to logical "1" where the
intermediate signal exceeds a pre-selected threshold and to
logical "0" where the intermediate signal does not exceed
the pre-selected threshold, calculating the autocorrelation
of the binary signal, and using the distance between
peaks in the autocorrelation of the binary signal as an
estimate of the pitch. The large amount of operations
needed in prior art algorithms is thus avoided. A similar
device is also provided.

Fig. 1 should be published.

Fig. 3b

Fig. 4